ABSTRACT

The effects of variations in degree of range restriction and different 
subgroup sample sizes on the validity of several item bias detection 
procedures based on Item Response Theory (IRT) were investigated in a 
simulation study. The degree of range restriction for each of two 
subpopulations was varied by cutting the specified subpopulation ability 
distribution at different locations and retaining the upper portion of the
distribution. It was found that range restriction did have an effect on
the accuracy of the bias detection procedures. The signed area index was
least influenced by variations in range restriction, whereas the base-low
area index was found to be invalid regardless of the degree of range
restriction. The findings for variations in sample size were mixed. 
A Simulation Study of the Effects of Ability Range Restriction
On IRT Item Bias Detection Procedures 

Numerous statistical techniques have been proposed for detecting test 
item bias (Green and Draper, 1972; Angoff and Ford, 1973; Wright et al., 
1976; Rudner, 1977; Wright and Stone, 1979; Scheuneman, 1979; Camilli, 
1980; Lord, 1980; Linn, Levine, Hastings, and Wardrop, 1981; Hulin et al., 
1982). Many of these techniques pinpoint bias at the item level, which has
led some psychometricians to refer to them as item bias detection methods.
Some methods are adapted from classical test theory (CTT) and others from
item response theory (IRT). CTT-based techniques are theoretically
problematic, because most of them have relied on sample-dependent item
statistics from CTT. The most frequently used CTT index, the transformed
item difficulty approach, uses the CTT item difficulty index, p. Although
previous investigators have attempted to control for the sample dependence
of this statistic, these attempts have not proven entirely successful
(Ironson, 1982). By contrast, techniques based on item response theory are
said to be theoretically superior to CTT-based ones. The sample-free
quality of IRT item parameters tends to make IRT methods of item bias
detection less sensitive to distributional differences in subpopulation
samples (Lord, 1980; Shepard, Camilli & Averill, 1981; Shepard, Camilli &
Williams, 198^).

All of the item bias techniques developed to date, whether IRT based 
or CTT based, are dependent on information internal to the test in 
question. These techniques establish the standard of unbiasedness by 
using either the total score on the test or the estimated trait score 
based on the responses to the items on the test. Bias in one item cannot 
be identified without considering information from other items on the 
test. Bias is identified when an item exhibits different characteristics
than the rest of the items in the test (Shepard, 1982; Burrill, 1982).
The appropriateness of an item to the trait assumed to be measured by the
test is not addressed by these techniques (Petersen, 1977; Shepard, 1982;
Shepard et al., 1985). The techniques only serve to make a test
homogeneous, whether or not the test thus constructed measures what we
want it to measure. Many authors have suggested that judgmental and
empirical methods must be used in combination.

An IRT model provides a probabilistic way of linking individuals'
item responses to the latent characteristic assumed to underlie those
responses. IRT models make use of an item characteristic curve (ICC), which
depicts the relationship between the probability of a correct response to
the item and the latent characteristic. For several reasons, the logistic
IRT models are preferred in most practical applications of IRT. The
three-parameter logistic model is given mathematically by:

$$P_i(\theta) = c_i + \frac{1 - c_i}{1 + \exp[-1.7\, a_i(\theta - b_i)]} \qquad (1)$$

where P_i(θ) is the probability that an examinee with a given level of
ability θ on the latent trait answers item i correctly, b_i is the
difficulty of item i, a_i is the discrimination index for item i, and c_i is
the lower asymptote for item i, which serves as a baseline for guessing.

The above equation states that the probability of success on an item
depends on nothing but the three item parameters and examinee ability θ. If
the model holds, a person's trait level θ is all we need in order to
determine his/her probability of success on any given item (Lord, 1980).
In other words, individuals who have the same value on the trait
dimension must have an equal probability of answering a specified item
correctly, regardless of their subpopulation group membership.
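To make Equation 1 concrete, the following Python sketch evaluates the three-parameter logistic ICC. The function name `p_3pl` and the example item are illustrative assumptions, not part of the original study; only the functional form and the constant 1.7 come from Equation 1.

```python
import math

def p_3pl(theta: float, a: float, b: float, c: float, D: float = 1.7) -> float:
    """Equation 1: probability of a correct response under the 3PL model.

    D = 1.7 is the scaling constant that appears in Equation 1.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

# Invariance in practice: two examinees with the same theta, one from each
# subpopulation, get the same probability when the item parameters are shared.
item = {"a": 1.0, "b": 0.0, "c": 0.2}   # hypothetical unbiased item
p_from_group_a = p_3pl(0.0, **item)
p_from_group_b = p_3pl(0.0, **item)
assert p_from_group_a == p_from_group_b   # both are about 0.6
```

Because the function receives only θ and the item parameters, group membership never enters the calculation; bias, in this framework, can only show up as group-specific item parameters.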
According to the model, therefore, the item parameters remain the
same regardless of the distribution of the trait in the subpopulation
tested. This notion of parameter invariance is the basis of all IRT bias
detection methods: a finding that an item's parameters differ across
subpopulations would mean that bias exists. The invariance property can be
understood by interpreting ICCs as nonlinear regression lines (Hulin et
al., 1983; Lord, 1980). Note that although IRT item parameters are
invariant across subpopulations, item parameter estimates will not
necessarily be identical when calculated in different samples. Because the
origin and unit of the trait scale are arbitrary, the invariance of item
parameters holds only if the origin and unit of the trait scale are the
same. The estimates can be placed on the same scale via an appropriate
linear equating transformation.

Factors Affecting IRT Parameterization 
Validity studies are often problematic due to many artifactual 
factors, such as small sample size, restriction of range in the sample, 
and criterion unreliability. These problems may also exist when item
bias research is performed in various decision-making contexts. Ideally,
item bias studies can be incorporated at the test construction stage to
minimize the chances of bias accusations arising later (Berk, 1982;
Drasgow, 1982). Consequently, it is important to understand what factors
can potentially affect the validity of the IRT bias detection techniques.

Other researchers have shown that CTT-based chi-square methods are 
sensitive to cutoffs for ability intervals, sample sizes, and the 
distribution of the total test scores. Nungester (1977) and Rudner (1977)
demonstrated that the chi-square values can become quite inflated when the
total observed score distributions differ. Baker (1981) also noted that
the Scheuneman chi-square procedure is confounded by unequal sample sizes
for the two subpopulations.

There are few Monte Carlo or empirical studies of this kind for IRT
bias techniques. It is clear that sample size and test length are
important to some extent, because they jointly influence the stability of
estimates of person and item parameters (Hulin et al., 1983).
Furthermore, it is not unreasonable to expect that true ability
distributions, like observed score distributions, may be highly skewed, if
not truncated, in numerous applied settings.

As noted previously, item parameters in IRT remain the same across
subpopulations up to a linear transformation. It should not be assumed,
however, that parameters can be estimated with equal accuracy in all the
subpopulation samples of interest (Ironson, 198a). Indeed, estimating
person and item parameters is an important problem encountered in
applications of IRT models. Since their true values are unknown, they must
be estimated simultaneously. LOGIST uses an iterative procedure that
alternates between trait estimates and item parameter estimates. The
iteration continues until both item and trait parameter estimates
converge, that is, until the estimates change by less than an arbitrarily
small value from the ith to the (i + 1)th iteration (Hulin et al., 1983).

Wright (1977) argued that irresolvable problems arise for some IRT
models when item and trait parameters must be estimated simultaneously
(Hulin et al., 1983). However, Hulin et al. (1982) showed that
simultaneous estimation of trait and item parameters using an iterative
procedure such as LOGIST may be sufficiently accurate for many practical
applications of IRT. This accuracy, however, is obtained only within
constraints of sample size and number of items. Although the specific
sample size and test length required depend on which parameters one wishes
to estimate accurately and on the purposes for which IRT is used,
Swaminathan and Gifford (1980) and Lord (1980) recommended using at least
50 items and 1,000 examinees for the recovery of item parameters.
However, Hulin, Lissak, and Drasgow (1982) investigated the recovery of
ICCs and found that, for three-parameter data, a test as short as 30 items
with 1,000 examinees, or 60 items with 500 examinees, appears sufficient
for accurate estimation of ICCs.

Criterion-related validity studies are limited by range restriction,
because they need external criterion scores as well as predictor test
scores. Since item bias studies do not need external criterion scores,
the situations in which range restriction affects validity studies are
not necessarily parallel to those in IRT item bias research. Nevertheless,
range restriction on ability, whether direct or indirect, may well occur
in many practical situations.

The purpose of the present study is to investigate the effect of 
range restriction in the subpopulation groups on the validity of item bias 
detection procedures. A type of range restriction similar to that which 
might be encountered in selection/training contexts provided the focus for 
the study. The restricted ranges of traits for each subpopulation were 
obtained by arbitrarily cutting the specified population ability 
distributions and taking the upper part of the distributions. 

The sample size in the subpopulation comparison groups was also
manipulated. Although sample size and test length should be considered
together for accurate estimation of IRT parameters, the requirements for
test length are less severe than those for sample size. Hulin et al.
(1982) indicated that for research involving the comparison of ICCs, such
as item bias studies, large numbers of items are not needed; however,
large numbers of subjects are necessary. Therefore, in the present
investigation test length was fixed at 50 items and only sample size and
range restriction were varied. Hulin et al. (1982) reported that there
were tradeoffs between test length and sample size. This might also be
true of sample size and degree of range restriction.

The second question investigated in the study concerned the estimation
accuracy of the item response function (IRF). All of the IRT-based bias
detection indices compare IRFs calculated from item parameters estimated
separately in each relevant subpopulation. The accuracy of these
techniques is therefore closely related to the estimation accuracy of the
IRFs, and examining IRF estimation accuracy for the subpopulations is a
desirable step before determining the efficiency of the techniques. In the
present study, examination of the IRFs was also important in its own
right, providing an indication of how restriction of range in traits
affected IRF estimation accuracy.

In summary, the following questions were addressed in the present 
study: 

1. To what extent did restriction of range in subpopulation samples
affect the accuracy of the selected bias detection indices? Did
different combinations of range restrictions for the two subpopulation
samples produce distinct results?

2. When range restriction and sample size were considered jointly, 
what were the effects in terms of accuracy of parameterization? 

3. How accurately were the IRFs estimated? 
METHOD
Data Simulation 

Binary response data sets were generated using a modified three-parameter
logistic IRT model (i.e., one in which the c-parameter was fixed). The
generation of responses began by constructing biased items. To make items
biased, item parameters were manipulated to be subpopulation dependent.
Two 50-item, four-option multiple-choice ability tests were simulated with
different amounts of overall test bias: Test 1 was constructed to contain
^0 biased items and Test 2 had 10 biased items.

For both tests, the item parameters for subpopulation 1, a_i1 and b_i1,
were drawn from uniform distributions on the interval [+.65, +1.6] and the
interval [-3, +3], respectively. The values of a_i2 and b_i2 for
subpopulation 2 were created by subtracting randomly sampled values of
(a_i1 - a_i2) and (b_i1 - b_i2) from the parameters for subpopulation 1.

In generating random values for < * ^jg the values were constrained 
to positive numbers. To avoid difficulty in estimating c parameters, 
these were arbitrarily fixed at .20 for all items in both subpopulations. 
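The item-generation step can be sketched in Python as follows. This is a hypothetical illustration: the section does not state how the differences (a_i1 - a_i2) and (b_i1 - b_i2) were distributed or how the biased items were chosen, so the default difference sampler (a-differences on [0, 0.5], b-differences on [-1, 1]), the random choice of biased items, and applying the positivity constraint to a_i2 are all assumptions. The uniform ranges for subpopulation 1 and the fixed c of .20 come from the text.

```python
import random

def generate_item_parameters(n_items=50, n_biased=10, c=0.20,
                             diff_sampler=None, seed=None):
    """Hypothetical sketch of the item-parameter generation step.

    Subpopulation 1: a ~ U[0.65, 1.6], b ~ U[-3, 3] (from the text).
    Subpopulation 2: biased items get a_i2 = a_i1 - da and b_i2 = b_i1 - db;
    unbiased items keep the subpopulation 1 values. c is fixed for all items.
    """
    rng = random.Random(seed)
    if diff_sampler is None:
        # ASSUMPTION: the difference distribution is not given in the text.
        def diff_sampler(r):
            return r.uniform(0.0, 0.5), r.uniform(-1.0, 1.0)
    # ASSUMPTION: the biased items are chosen at random.
    biased = sorted(rng.sample(range(n_items), n_biased))
    items1, items2 = [], []
    for i in range(n_items):
        a1, b1 = rng.uniform(0.65, 1.6), rng.uniform(-3.0, 3.0)
        a2, b2 = a1, b1
        if i in biased:
            da, db = diff_sampler(rng)
            while a1 - da <= 0:           # ASSUMPTION: keep a_i2 positive
                da, db = diff_sampler(rng)
            a2, b2 = a1 - da, b1 - db
        items1.append((a1, b1, c))
        items2.append((a2, b2, c))
    return items1, items2, biased
```

For example, `generate_item_parameters(n_biased=10, seed=1)` returns 50 (a, b, c) triples per subpopulation together with the indices of the 10 biased items.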

The distribution of the latent trait was assumed to be normal with
a standard deviation of 1.0 for each subpopulation. Subpopulation 1,
hereafter denoted as the 'A' group, had a mean ability of +.5, and
subpopulation 2, hereafter denoted as the 'B' group, had a mean ability of
-.5. Test length was fixed at 50 items because, in practice, test length
is less of a problem than sample size. Within each subpopulation, four
restriction groups (labeled W, X, Y, and Z) and a no-restriction group
(labeled N) of examinees were specified as follows: (W) all examinees
with θ above +1.5; (X) all those with θ above +1.0; (Y) all those with θ
above 0.0; (Z) all those with θ above -1.0; and (N) no restriction.
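A minimal sketch of drawing examinees for one restriction condition, assuming simple rejection sampling from N(mean, 1): draws at or below the cut point are discarded so that only the upper portion of the distribution is retained. The cut points and group means come from the text; the function and dictionary names are hypothetical.

```python
import random

# Lower cut points on theta for each restriction condition (N = no restriction).
CUTS = {"W": 1.5, "X": 1.0, "Y": 0.0, "Z": -1.0, "N": None}
# Subpopulation ability means; both standard deviations are 1.0.
GROUP_MEANS = {"A": 0.5, "B": -0.5}

def sample_abilities(group, condition, n, seed=None):
    """Draw n thetas from N(mean, 1), keeping only values above the cut point."""
    rng = random.Random(seed)
    mu, cut = GROUP_MEANS[group], CUTS[condition]
    thetas = []
    while len(thetas) < n:
        theta = rng.gauss(mu, 1.0)
        if cut is None or theta > cut:   # retain the upper part of the distribution
            thetas.append(theta)
    return thetas
```

Rejection sampling is simple but wasteful for severe cuts: for the B group under condition W the cut lies two standard deviations above the mean, so only about 2% of draws are kept.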
The probability of a correct response to an item was calculated by
entering the appropriate subpopulation item parameters and the trait
values of simulated examinees into Equation 1. The probability of a
correct response was then compared to a random number drawn from the
uniform distribution on (0, 1.0). If the sampled number was larger than
the probability of a correct response, the item was scored as incorrect;
otherwise, a correct response was recorded for the item. The item
parameters a_i and b_i and the person parameter θ were estimated by
analyzing the simulated data sets using the LOGIST computer program
(Wingersky, Barton & Lord, 1983). Default convergence criteria for LOGIST
were used.
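The scoring rule described above can be sketched in Python as follows. It redefines the 3PL function of Equation 1 so the block is self-contained; the function names are hypothetical, and the subsequent parameter estimation with LOGIST is not reproduced here.

```python
import math
import random

def p_3pl(theta, a, b, c, D=1.7):
    """Equation 1: three-parameter logistic ICC."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def simulate_responses(thetas, items, seed=None):
    """Return a 0/1 response matrix (examinees x items).

    A uniform draw larger than P_i(theta) scores the item incorrect;
    otherwise the item is scored correct.
    """
    rng = random.Random(seed)
    data = []
    for theta in thetas:
        row = []
        for a, b, c in items:
            p = p_3pl(theta, a, b, c)
            row.append(0 if rng.random() > p else 1)
        data.append(row)
    return data
```

Each item is therefore answered correctly with probability P_i(θ), so the simulated data follow the model exactly, and any discrepancy between true and estimated ICCs comes from estimation error.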

The known simulated amount of bias in an item was measured by each of
the bias detection indices described below, and this value was
correlated with the detected amount of bias measured by the same index in
the simulated item responses. Preliminary analysis showed that when the
correlations were calculated using all 50 items, relatively low
correlations were observed for most of the indices in the study, and the
correlations also behaved less lawfully with changes in degree of
restriction. It was suspected that poor estimates of item parameters had
obscured the effect of range restriction. Previous research (Swaminathan &
Gifford, 1983; Hulin et al., 1982) has indicated that the range of
b-parameter values in test items is an important factor in the accurate
estimation of item parameters. Therefore, in addition to the correlations
for all 50 items, correlations were also computed for a subset of 27 items
selected from the original 50 items whose difficulty parameter values were
between -2.0 and +2.0, and which were hence likely to be better estimated.
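The accuracy criterion can be sketched as a Pearson correlation between true and detected bias, computed once over all items and once over the items whose generating b-values lie in [-2.0, +2.0]. The use of Pearson's r (rather than, for example, Spearman's rank correlation) and the choice of which group's generating b-values define the subset are assumptions; the function names are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

def detection_accuracy(true_bias, detected_bias, b_true, lo=-2.0, hi=2.0):
    """Correlate true and detected bias over all items and over the subset
    of items whose generating b lies in [lo, hi]."""
    r_all = pearson_r(true_bias, detected_bias)
    keep = [k for k, b in enumerate(b_true) if lo <= b <= hi]
    r_subset = pearson_r([true_bias[k] for k in keep],
                         [detected_bias[k] for k in keep])
    return r_all, r_subset, len(keep)
```

In the study the subset rule selected 27 of the 50 items; with other generated item sets the count would differ.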

Within each restricted subpopulation, samples of three sizes (N = 1000,
600, and 300 examinees) were drawn. Each restriction group-sample size
combination for group 'A' was paired with the restriction-sample size
combinations for group 'B' that had an identical restriction condition
and an equal or smaller sample size. This made group 'A' the majority
group. Applying these criteria, a total of 60 comparison datasets were
generated to permit 30 comparisons of an 'A' group dataset with a 'B'
group dataset.
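The pairing rule can be checked by enumerating the design. This short sketch (all names hypothetical) lists each restriction condition with every A-B sample size pair in which the B sample is no larger than the A sample.

```python
from itertools import product

RESTRICTIONS = ["W", "X", "Y", "Z", "N"]
SAMPLE_SIZES = [1000, 600, 300]

# Same restriction for A and B; the B sample is never larger than the A sample.
comparisons = [(r, n_a, n_b)
               for r in RESTRICTIONS
               for n_a, n_b in product(SAMPLE_SIZES, repeat=2)
               if n_b <= n_a]

size_pairs = sorted({(n_a, n_b) for _, n_a, n_b in comparisons}, reverse=True)
print(len(comparisons))   # 30
print(size_pairs)         # [(1000, 1000), (1000, 600), (1000, 300), (600, 600), (600, 300), (300, 300)]
```

Five restriction conditions times six sample size pairs gives the 30 comparisons; crossing these with the two tests gives the 60 conditions referred to in the Results.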

Equating

Because the trait scales from an IRT analysis are arbitrary, the a and b
values for the two subpopulations are not directly comparable; the ICCs
must first be equated to the same scale. To adjust the item parameters
from different subpopulations, a linear transformation of the b
parameters as described in Linn et al. (1981) was used. The equating is
determined by a best-fitting line that adjusts for the difference in
average b-parameter values and has a slope equal to the ratio of the
standard deviations of the two sets of b's (Shepard et al., 198'*). Linn
et al. (1981) selected equating constants so that the weighted mean and
variance of the b's in the comparison group were equal to the weighted
mean and variance of the b's in the base group. The weight for each item
was the inverse of the larger of its two sampling variances for the
b-parameter (from the base or the comparison group).
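A sketch of the weighted mean-and-sigma equating described above, in plain Python. The weights are the inverse of the larger of the two sampling variances of each item's b, as in the text. Rescaling the a's by 1/A is an addition here that follows from the linear change of the θ scale (c is unaffected); the function names are hypothetical.

```python
import math

def weighted_mean_sd(values, weights):
    """Weighted mean and weighted standard deviation."""
    total = sum(weights)
    m = sum(w * v for v, w in zip(values, weights)) / total
    var = sum(w * (v - m) ** 2 for v, w in zip(values, weights)) / total
    return m, math.sqrt(var)

def equate_to_base(b_base, b_comp, var_b_base, var_b_comp, a_comp=None):
    """Put comparison-group item parameters on the base-group scale.

    Weight per item = 1 / max(sampling variance of b in base, in comparison).
    Slope A and intercept B make the weighted mean and SD of the transformed
    comparison b's equal those of the base b's.
    """
    w = [1.0 / max(vb, vc) for vb, vc in zip(var_b_base, var_b_comp)]
    m_base, s_base = weighted_mean_sd(b_base, w)
    m_comp, s_comp = weighted_mean_sd(b_comp, w)
    A = s_base / s_comp
    B = m_base - A * m_comp
    b_eq = [A * b + B for b in b_comp]
    a_eq = None if a_comp is None else [a / A for a in a_comp]
    return b_eq, a_eq, A, B
```

When the two sets of b's differ by an exact linear transformation, this recovers that transformation regardless of the weights; with real estimates the weights downplay items whose b's are poorly estimated in either group.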

Estimation Accuracy of Item Response Function
As noted previously, all IRT-based bias detection indices compare IRFs
calculated from the relevant subpopulations. The accuracy of these
techniques is closely related to the estimation accuracy of the IRFs.
Examining the IRF estimation accuracy for the subpopulation groups is
therefore a desirable step before determining the efficiency of the
techniques. If IRFs proved poorly estimated but bias detection was
accurate, the accuracy might have been obtained by chance, or the
efficiency of the techniques might be questionable. This examination is
also important in its own right, giving an indication of how restriction
of range on a trait affects IRF estimation accuracy.

As noted by Hulin et al. (1983), the primary emphasis in a study of
item bias is on the accuracy of the estimation of ICCs. Although a close
relation between parameters and parameter estimates can indicate good
estimation, this criterion may be overly stringent for a study of bias.
They suggested using the root mean square error (RMSE) to investigate the
recovery of ICCs rather than the recovery of item parameters. In the
present study this statistic was used as an index of IRF estimation
accuracy. For item i, the index is given by

$$\mathrm{RMSE}_i = \left\{ \frac{1}{31} \sum_{j=1}^{31} \left[ P_i(\theta_j) - \hat{P}_i(\theta_j) \right]^2 \right\}^{1/2}$$

where P_i(θ) is the true ICC for item i and P̂_i(θ) is the recovered ICC.
Thirty-one values of θ were chosen at equal intervals from -3 to +3.
RMSEs were averaged over all 50 items in each range restriction-sample
size combination.
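The RMSE index can be computed directly from its definition. This sketch assumes both ICCs follow Equation 1 and that the recovered parameters have already been equated to the scale of the true parameters; the names are hypothetical.

```python
import math

def p_3pl(theta, a, b, c, D=1.7):
    """Equation 1: three-parameter logistic ICC."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

# 31 equally spaced theta values from -3 to +3 (step 0.2).
THETA_GRID = [-3.0 + 0.2 * j for j in range(31)]

def icc_rmse(true_item, est_item):
    """RMSE between the true and recovered ICC of one item over the grid.

    Each item is an (a, b, c) tuple.
    """
    sq = [(p_3pl(t, *true_item) - p_3pl(t, *est_item)) ** 2 for t in THETA_GRID]
    return math.sqrt(sum(sq) / len(sq))

def mean_rmse(true_items, est_items):
    """Average RMSE over all items in one restriction-sample size condition."""
    return sum(icc_rmse(t, e) for t, e in zip(true_items, est_items)) / len(true_items)
```

Because the comparison is made on the ICCs rather than on a, b, and c separately, parameter errors that offset one another (for example, a slightly low a paired with a compensating b) are not penalized, which is the reason Hulin et al. preferred this criterion for bias work.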

Bias Indices 

Hambleton and Swaminathan (1985) divided procedures for assessing
item bias using IRT methods into three categories, involving comparisons
of ICCs, comparisons of vectors of item parameters, and comparisons of the
fit of the IRT model to the data. Since the data are simulated, the RMSEs
were used to address the last concern. To address the other two general
procedures, the following item bias indices were computed:
1. Area Indices

The ability continuum was divided from -3 to +3 into 600 intervals (Linn
et al., 1981). The absolute values of the differences between the ICCs,
given by

$$D_j = \left| P_{i1}(\theta_j) - P_{i2}(\theta_j) \right|$$

at the midpoints θ_j of these 600 intervals, are multiplied by the width of
the interval, .01 in this case.

a.) Base-High Area (BH):

$$BH = \sum_{j=1}^{600} (.01)\, k_j D_j$$

b.) Base-Low Area (BL):

$$BL = \sum_{j=1}^{600} (.01)(1 - k_j) D_j$$

where k_j = 1 if P_i1(θ_j) > P_i2(θ_j), and k_j = 0 if P_i1(θ_j) < P_i2(θ_j).

c.) Total Area (TA): TA = BH + BL

d.) Signed Area (SA): SA = BH - BL

e.) Root Mean Squared Difference (RMSD):

$$RMSD = \left\{ \frac{1}{600} \sum_{j=1}^{600} \left[ P_{i1}(\theta_j) - P_{i2}(\theta_j) \right]^2 \right\}^{1/2}$$
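The five area-based indices can be computed in one pass over the 600 interval midpoints. This sketch assumes `item1` holds the base-group parameters and `item2` the equated comparison-group parameters, so that k_j = 1 when the base-group ICC is higher; the names are hypothetical.

```python
import math

def p_3pl(theta, a, b, c, D=1.7):
    """Equation 1: three-parameter logistic ICC."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def area_indices(item1, item2, lo=-3.0, hi=3.0, n=600):
    """Base-high, base-low, total and signed areas plus RMSD between two ICCs.

    item1 is the base-group (a, b, c); item2 the equated comparison-group
    (a, b, c). The ICC differences are evaluated at the midpoints of n
    equal intervals on [lo, hi].
    """
    width = (hi - lo) / n
    bh = bl = ss = 0.0
    for j in range(n):
        theta = lo + (j + 0.5) * width
        diff = p_3pl(theta, *item1) - p_3pl(theta, *item2)
        if diff > 0:          # k_j = 1: base ICC above comparison ICC
            bh += width * diff
        else:                 # k_j = 0
            bl += width * -diff
        ss += diff ** 2
    return {"BH": bh, "BL": bl, "TA": bh + bl, "SA": bh - bl,
            "RMSD": math.sqrt(ss / n)}
```

As a sanity check, when two items share a and c and differ only in b, the area between the curves over an unbounded θ range equals (1 - c)|b_2 - b_1|; over [-3, +3] the total area comes out slightly smaller because the tails are cut off. Note also that SA lets differences of opposite sign cancel, whereas TA does not.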
2. Differential Parameter Indices

Lord's chi-squared statistic was calculated for testing the null
hypothesis that, for a particular item i, both b_i1 = b_i2 and a_i1 = a_i2.
The chi-squared statistic is

$$\chi_i^2 = \mathbf{v}_i' \left( \mathbf{S}_{i1} + \mathbf{S}_{i2} \right)^{-1} \mathbf{v}_i$$

where $\mathbf{v}_i' = (\hat{b}_{i1} - \hat{b}_{i2},\ \hat{a}_{i1} - \hat{a}_{i2})$;
S_i1 is the sampling variance-covariance matrix of $\hat{a}_{i1}$ and
$\hat{b}_{i1}$ in group 1, and similarly for S_i2. For maximum likelihood
estimators, S_i1 and S_i2 are found from the formulas
$\mathbf{S}_{i1} = \mathbf{I}_{i1}^{-1}$ and $\mathbf{S}_{i2} = \mathbf{I}_{i2}^{-1}$,
where I is the 2 × 2 information matrix for a_i and b_i. The significance
test was carried out separately for each item by computing the
chi-squared value and comparing it to the critical value with 2 degrees
of freedom (Lord, 1980).
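Lord's statistic needs only 2 × 2 matrix algebra, so it can be written without external libraries. In this sketch the estimates and information matrices are ordered (b, a) to match v_i, and the item estimates are assumed to be already equated. The .05 significance level is an assumption, since the section does not state which level was used. For 2 degrees of freedom the chi-square upper-tail probability is exactly exp(-x/2), which the sketch uses for the p-value. The names are hypothetical.

```python
import math

CRITICAL_05 = 5.991  # chi-square critical value, 2 df, alpha = .05 (assumed level)

def inv2(m):
    """Inverse of a 2 x 2 matrix given as [[p, q], [r, s]]."""
    (p, q), (r, s) = m
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]

def lord_chi_square(est1, est2, info1, info2):
    """Lord's chi-square for H0: b_i1 = b_i2 and a_i1 = a_i2 (2 df).

    est1, est2: (b, a) estimates for the item in groups 1 and 2.
    info1, info2: 2 x 2 information matrices ordered (b, a); S = I^-1.
    """
    v = [est1[0] - est2[0], est1[1] - est2[1]]
    s1, s2 = inv2(info1), inv2(info2)
    pooled = [[s1[i][j] + s2[i][j] for j in range(2)] for i in range(2)]
    w = inv2(pooled)
    chi2 = sum(v[i] * w[i][j] * v[j] for i in range(2) for j in range(2))
    p_value = math.exp(-chi2 / 2.0)   # chi-square upper tail for 2 df
    return chi2, p_value
```

With diagonal information matrices of 100 in both groups and a b-difference of 0.3, the statistic is 4.5, below the assumed critical value of 5.991, so the item would not be flagged at the .05 level.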
RESULTS
Identification of Bias 

The six item bias indices were calculated for each item in each of 
the 60 comparison conditions created by employing the 5 levels of range 
restriction, 6 levels of group sample size combinations, and 2 levels of 
the number of biased items. The amount of "true" simulated statistical 
bias was calculated using each of the bias detection indices in the study 
and was in turn correlated with the detected bias measured by the same 
bias index for each sample condition. One exception was that the bias
estimates computed using the chi-square index were correlated with the
"true" bias measured by the total area index.

The results from these analyses are presented in Tables 1 through 6, 
one table for each of the six bias indices used. The results are 
presented for a test with 10 biased items and for a test with ^0 biased
items. Within each test, results are presented for all 50 items and
separately for the 27 items that had population b-parameter values
between -2.0 and +2.0 (which presumably would have led to better item
parameter estimates in the samples). The restriction conditions and the
sample sizes for the A and B groups are given in the left column of each
table. Three blocks of results are presented as separate rows, in which
the A group sample size is fixed and paired with variations in the B group
sample size for each possible range restriction condition, as discussed
above.

Total Area Method 

Test with 10 biased items 

Table 1 presents correlations between the generated amounts of bias 
and detected amounts of bias, which were derived from the total area index 
for the test that had 10 biased items. The results for equal and unequal
sample sizes are discussed in separate sections below. 
Equal sample sizes for both groups

For both the 50-item and 27-item tests, correlations between the generated
and detected amounts of bias steadily decreased with an increase in range
restriction in the ability distribution of the samples with N=1000. This
was also true for the condition with sample sizes of N=600 for both
groups. The correlations between true bias and bias estimates for samples
of N=300 examinees fluctuated with an increase in restriction but still
indicated that range restriction affected the detection accuracy of the
index.

Unequal sample sizes 

The results for the 50-item test indicated that the index was not
influenced by range restriction in the samples across different
combinations of sample sizes. However, the correlations computed on the 27
items whose simulated difficulties ranged between -2.0 and +2.0 rose with
a decrease in range restriction, especially for the 1000-600 combination.
The Z restriction condition correlations were comparable to those of the
no restriction conditions. Interestingly, the W restriction condition
resulted in high correlations across sample size combinations.

Test with ^0 biased items

Equal sample sizes for both groups

The right-most columns of Table 1 present the results from the ^0
biased item test conditions. Because biased items contribute to poorer
estimation of abilities, it was expected that the correlations for the
item bias measures would be low. However, this was not what was found.
The correlations tended to be only slightly lower in general than in the
test with 10 biased items. As range restriction became more severe,
correlations between the generated and detected amounts of bias decreased
when both samples had N=1000. In the equal sample size conditions where
N=600 and N=300, the relationships between range restriction in the sample
of examinees and the efficiency of detecting item bias did not yield a
consistent pattern. Correlations for the no restriction conditions were
extremely low, indicating that under the no range restriction conditions
the total area index can actually be less accurate. Using a sample size of
600 for each of the two comparison groups, there was no clear indication
that range restriction had a systematic effect on the accuracy of the
index. When N=300 and the samples were restricted to θ above 1.0, the
detection technique was most effective. It is hypothesized that this
result was due to large overall bias in the test. The correlations did not
improve for the 27-item version of the test. This also may result from the
severe distortion of the ability dimension due to too many biased items.
When sample sizes equaled 300, little was learned about the effect of
range restriction on the index.

Unequal sample sizes 
There was little evidence that the Total Area Method is sensitive to 
range restriction of ability distributions, when groups are of unequal 
sample sizes. However, correlations were the lowest when no restriction 
was imposed on the ability distributions. In the extreme condition (W), 
relatively high correlations were observed. This was true for both the
50-item and 27-item tests.
Base-High Area Method

Test with 10 biased items 

Equal sample sizes for both groups

In Table 2 it can be seen that for equal sample sizes of N=1000, there was
a substantial difference in the values of correlations between condition
X and conditions Y, Z, and N. Correlations for the 50-item and 27-item
tests were fairly stable in the Y, Z, and N conditions, indicating that
the base-high area index was somewhat robust to restricted range in
abilities. When sample sizes were N=600, the trend was similar to that for
N=1000 for the 50-item test. However, in the 27-item test with N=600, the
index was extremely accurate in detecting item bias, with a correlation of
0.9^^ under the no restriction condition. Although relationships between
the bias estimates and known bias were quite high and stable across
restriction conditions, the difference in the magnitude of correlations
between the groups in the presence and absence of restrictions was large.
With 300 examinees in each of the groups, when range restrictions were
more severe than condition Z, the index appeared to be invalid. Condition
Z proved to be very tolerable for estimating the degree of bias in items
with the index.

Unequal sample sizes

The effects of having restricted samples for the two comparison groups
seem to be clear for both the 50-item test and the 27-item test. As was
the case with the Total Area Method, however, the correlations for the
50-item test were relatively low even in the absence of range restriction
in the samples. Therefore, the index may not always have high validity
with respect to discovering the presence of bias in items. With the 27
items whose simulated values of difficulty parameters lie between -2.0 and
+2.0, there was a very clear tendency for detectability to become larger
as the samples became less restricted in range. Interestingly, the
correlations in the Z restriction conditions were the highest, higher than
those in the no restriction conditions in both the 1000-300 and 600-300
sample size combinations. This suggests that not having individuals of
very low ability in a sample may enhance detection of biased items.

Test with ^0 biased items 

Equal sample sizes for both groups

Since biased items contribute to the estimation of person parameters, it
was expected that the degree of range restriction would have little
systematic influence on the bias detection accuracy of the index in the
samples (see Table 2). This was not necessarily the case. The outcome was
similar to that obtained in the test with 10 biased items in terms of the
magnitudes of correlations associated with each restriction condition.
However, for the samples of N=1000, the correlations obtained in the no
restriction conditions were much lower than those of the Z conditions.
With N=300 for both groups, there was a constant decrease in the
detectability of the index with an increase in the degree of range
restriction in the sample.

Unequal sample sizes 

There seemed to be a similar trend in the range restriction effect on the
accuracy of the base-high index for unequal sample sizes. A substantial
increase in the values of correlations was observed between the X and Y
range restriction conditions.
Base-Low Area Method

Test with 10 biased items 

Equal sample sizes for both groups 

The Base-Low method resulted in near-zero correlations for both the
50-item and 27-item tests (see Table 3). Although the index had one
correlation of 0.^5 with N=600, the Spearman correlation computed for this
condition was zero, and thus this result may be attributed to chance.

Unequal sample sizes 

The results were essentially the same as for the equal sample size 
conditions. All of the correlations were essentially zero, with no 
indication of range restriction effects. 

Test with ^0 biased items 

Equal sample sizes for both groups 

It was expected that the test with ^0 biased items would have an outcome
similar to that of the 10 biased item test, since biased items distort the
ability dimension and the index appeared to be very invalid for these
particular data sets. Again, surprisingly, this was not the case. The
resulting correlations were much higher than those in the test with 10
biased items, and there was an indication that correlations in the more
restricted conditions were lower than in the less restricted conditions.
As was the case with the Base-High method, the largest change in the value
of the correlations occurred between the X and Y conditions when N=1000
for both groups. The correlations were lower in the 27-item test than in
the 50-item test, and the correlations obtained with N=600 and N=300 were
close to zero.
Unequal sample sizes

Results obtained on the 50-item and 27-item tests indicated no systematic
influence of the restriction in range of examinees' traits. The exception
was the 1000-300 sample size combination for the 27-item test, in which
the correlations increased as the degree of range restriction in the
samples decreased.

Signed Area Index

Test with 10 biased items

Equal sample sizes for both groups

The results presented in Table 4 reveal that most correlations were
relatively high. However, the magnitude of the relationship between the
estimated bias and known bias showed no clear trend as the degree of
restriction increased. Instead, correlations remained relatively stable
across restriction conditions. In contrast with the previous item bias
indices discussed above, the correlations in the no restriction conditions
tended to be fairly high. Focusing on the subset of 27 items, the
correlations consistently increased over those obtained when all 50 items
were used. All of the correlations are impressively high across the
restriction conditions, indicating that this method is robust to range
restriction. One exception is when the sample sizes in both groups A and B
are 300; then increasing range restriction tends to lead to a decline in
the correlation.
Unequal sample sizes 

As in the equal sample size conditions, the correlations were generally
highest when examinees were sampled from a wider range of the ability
distribution. Detection accuracy remained quite high even in the most
severe restriction condition for the 27-item version of the test.
Test with 40 biased items

Equal sample sizes for both groups

Across the restriction conditions, the correlations were fairly uniform
and high, as was true for the test with 10 biased items (see Table 4).
On average, the correlations were higher with 40 biased items in the test
than with 10 biased items.

Unequal sample sizes 

Differences in the degree of restriction did not appear to influence the
accuracy of the method. The results were very similar for the 50-item and
27-item tests, and again the correlations were somewhat higher for the
27-item version. When N=1000 for the A groups and N=300 for the B groups,
the method detected item bias under the W restriction condition as
accurately as under the no restriction condition. In summary, the Signed
Area method appeared robust to the number of biased items in the test,
to differences in sample size, and to the degree of range restriction in
the samples.

RMSD Method 

Test with 10 biased items 

Equal sample sizes for both groups 

Based on the results presented in Table 5, restriction in range had
no consistent, systematic impact on the RMSD item bias technique when
all 50 items were considered. However, the 27-item version yielded
results more consistent with the expectation that increases in range
restriction would lead to lower correlations. The correlations dropped
sharply when the more severe restrictions were imposed on the range of
the samples, and this drop was most evident when N=300 in both groups.

For both groups A and B there was a large difference between the
correlations in the no restriction condition and those in the moderate
restriction conditions Y and Z. The strength of the relationship
between the generated and detected bias also fluctuated across the
restriction conditions.

Unequal sample sizes

There was no clear indication that the detection accuracy of the RMSD
bias index was systematically influenced by restriction in range. The
correlations under the W restriction condition were as high as under the
Z or the N (no restriction) conditions. However, it cannot be concluded
that the method was robust to severe restriction, because correlations
were lower under the X and Y restriction conditions.

Test with 40 biased items

Equal sample sizes for both groups 

There was no indication that the technique failed to detect biased items
in the 50-item test with N=1000 and N=600 when the ability distributions
of the two group samples were more restricted in range (see Table 5).
With sample sizes of N=300, the differences in correlations between the
restricted and unrestricted conditions were substantial. In the absence
of range restriction, the index was the least accurate method of
detecting item bias across sample size conditions, for both the 50-item
and the 27-item tests. Apart from the no restriction condition, the
correlations for the 27-item test decreased systematically as restriction
increased, for every sample size combination.

Unequal sample sizes

As was the case with identical sample sizes, range restriction in the
samples did not seem to affect the detection accuracy of the index. The
correlations in the most severe restriction condition, W, were much
higher than those in the no restriction condition. Perhaps the large
number of biased items in the test renders these results
uninterpretable. In general, the RMSD index performed differently when
there were 10 biased items in the test than when there were 40.

Chi-Square Index

Test with 10 biased items 

Equal sample sizes for both groups 

With the 27 items whose difficulty parameters lie between -2 and +2, the
Pearson correlations between the estimated and simulated bias decreased
steadily as the degree of range restriction on the ability distributions
increased (see Table 6). The highest correlation was obtained with the
27 items under the no restriction condition when each group had 600
examinees. Range restriction was also observed to reduce the accuracy of
the index for the 50-item test.

Unequal sample sizes

With the exception of very high correlations under the W condition for
the 600-300 combination with 50 items and the 1000-300 combination with
27 items, the greater the range restriction of the sample, the more the
bias detection accuracy of the technique deteriorated. Under the Z
conditions the method detected bias as effectively as under the N
conditions.

Test with 40 biased items

Equal sample sizes for both groups 

Compared with the 10 biased item test, the impact of range restriction
on the chi-square method was more evident, as the correlations
fluctuated with changes in the degree of restriction. The correlations
under the no restriction condition were smaller than under the Z
condition in all sample size combinations. This may be attributed either
to unstable estimation of abilities due to the many biased items, or to
poorer estimation of the item or person parameters below an ability value
of -1.5. Therefore, the absence of restriction may not always be the best
condition for detecting item bias.

Unequal sample sizes 

With the 1000-300 and 600-300 sample size combinations, decreasing the
degree of restriction led to an increase in the correlation between the
generated bias and the detected bias for the 50-item test. This
relationship did not occur with the 27-item test. Since all of the
correlations were very low when the two groups differed in sample size,
the presence or absence of range restriction in the groups does not seem
to make a practical difference in the validity of the chi-square index.
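Unlike the area indices, Lord's (1980) chi-square compares the two groups' item parameter estimates directly and scales their difference by the sampling covariances. The following is a minimal sketch under assumptions not stated in this section: each item has two estimated parameters (a, b), the estimates are already linked to a common scale, and the covariance values in the example are hypothetical. The function names are illustrative and are not taken from the study's software.

```python
import math

import numpy as np


def lords_chi_square(est_a, cov_a, est_b, cov_b):
    """Lord's chi-square statistic for one item.

    est_a, est_b: parameter estimates (here [a, b]) from groups A and B,
        assumed to be on a common metric already.
    cov_a, cov_b: the corresponding sampling covariance matrices.
    """
    v = np.asarray(est_a, dtype=float) - np.asarray(est_b, dtype=float)
    sigma = np.asarray(cov_a, dtype=float) + np.asarray(cov_b, dtype=float)
    # v' Sigma^{-1} v, computed with solve() instead of an explicit inverse
    return float(v @ np.linalg.solve(sigma, v))


def p_value_df2(stat):
    """Upper-tail chi-square probability for 2 degrees of freedom,
    which has the closed form exp(-x / 2)."""
    return math.exp(-stat / 2.0)


# Hypothetical estimates: equal slopes, difficulty 0.5 higher in group B.
cov = np.eye(2) * 0.01
stat = lords_chi_square([1.0, 0.0], cov, [1.0, 0.5], cov)
print(stat, p_value_df2(stat))  # statistic close to 12.5, p about 0.0019
```

With two degrees of freedom the upper-tail probability is exactly exp(-χ²/2), so no statistics library is needed. Other degrees of freedom would require a general chi-square survival function.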

Root Mean Square Error 
Recovered ICCs were compared with the true ICCs, calculated from the
simulated parameters, at 31 equally spaced theta values from -3.0 to
+3.0, using the root mean squared error (RMSE) for each item. The RMSEs
for the 50 items were then averaged for each sample size and group
combination. The results are presented in Table 7. Within each sample
size, the average RMSEs for each group generally increased as range
restriction increased, as expected. RMSEs were comparatively large for
the "A" group, indicating less accurate recovery of the ICCs across the
range restriction conditions. The "B" group was simulated to take a test
containing either 10 or 40 biased items. For this group, the RMSEs under
the Z and N conditions were below 0.06 for all sample sizes except in the
Z condition with N=600. These values indicate quite accurate recovery of
the ICCs, although range restriction clearly affected parameter
estimation.
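The RMSE criterion described above can be sketched as follows. The 31-point grid from -3.0 to +3.0 and the averaging over the items of a test follow the text. Several other choices are assumptions of this sketch: the three-parameter logistic form, D = 1.7, and recovered parameters that have already been placed on the generating metric. The helper names are hypothetical.

```python
import numpy as np

D = 1.7  # logistic scaling constant (assumption)

# 31 equally spaced ability points from -3.0 to +3.0 (step 0.2), as in the text
THETA_GRID = np.linspace(-3.0, 3.0, 31)


def icc(theta, a, b, c=0.0):
    """Three-parameter logistic item characteristic curve (assumed model)."""
    return c + (1.0 - c) / (1.0 + np.exp(-D * a * (theta - b)))


def item_rmse(true_params, est_params, theta=THETA_GRID):
    """RMSE between the true and the recovered ICC of one item."""
    diff = icc(theta, *est_params) - icc(theta, *true_params)
    return float(np.sqrt(np.mean(diff ** 2)))


def average_rmse(true_items, est_items):
    """Mean item RMSE over all items of a test (50 in the study)."""
    if len(true_items) != len(est_items):
        raise ValueError("true and estimated item lists differ in length")
    total = sum(item_rmse(t, e) for t, e in zip(true_items, est_items))
    return total / len(true_items)


# Hypothetical (a, b, c) values for two items, generating vs. recovered.
true_items = [(1.2, -0.5, 0.2), (0.8, 1.0, 0.2)]
est_items = [(1.1, -0.4, 0.18), (0.9, 1.1, 0.25)]
print(average_rmse(true_items, est_items))
```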

CONCLUSIONS

The major purpose of the present study was to investigate how range
restriction in the ability distribution of the sample influences the
detection accuracy of several item bias detection techniques. For a test
with 10 biased items, range restriction had a clearly systematic
influence on the accuracy of three of the techniques: the Total Area
index, the Base High Area index, and the chi-square index. For the larger
sample sizes, the correlations between the generated bias and the bias
detected by these techniques decreased steadily as the degree of range
restriction in the sample increased. When these techniques were applied
to the test with 40 biased items, the effects of range restriction were
less systematic.

The Pearson correlations between the true bias and the bias detected by
the Base Low Area index were close to zero across all of the
restriction-sample size combinations. This was true for the tests with
both 10 and 40 biased items. The one exception was a decreasing trend in
the correlations with increasing restriction when both groups had the
same number of examinees and were simulated to take the test with 40
biased items.

The Signed Area index appeared to be robust to range restriction in the
sample's ability distribution. It was generally accurate in identifying
biased items even under the most severe range restriction conditions,
and this did not vary with the number of biased items in the test.
When the sample sizes of the two groups were equal, the RMSD index was
sensitive to ability range restriction in the two groups. In contrast,
there was little indication that the degree of range restriction in the
group samples affected the accuracy of the index when the groups
compared had different numbers of examinees.

The results based on all 50 items, whose difficulty parameters ranged
between -3 and +3, were compared with the results based on only the 27
items whose difficulty parameters ranged between -2 and +2. The effect of
range restriction tended to be much clearer for the 27-item test. This
was expected, because previous researchers have indicated that IRFs tend
to be estimated very accurately when a test has no very difficult or
very easy items. Accurate estimation of the item response functions
might in turn give a more precise picture of the effect of range
restriction on the bias detection indices. When N=1000 for both groups,
the Pearson correlations between the simulated and detected bias tended
to be lower with no range restriction than in the Z restriction
condition, where examinees were randomly sampled from the normal
distribution above -1.0. To examine the possibility of random sampling
error, 1,000 ability values were randomly sampled again for the "A"
group (i.e., the no restriction condition was replicated for N=1000 in
both groups). The Total, Base High, Signed Area, Base Low, and chi-square
indices were then recalculated. The new correlations for the Total, Base
High, Base Low, Signed Area, and chi-square indices were 0.34, 0.55,
0.09, 0.59, and 0.70, respectively. These recalculated correlations were
larger than the original ones, and slightly larger than those for the Z
conditions presented in Tables 1-4 and Table 6. Three tentative
conclusions follow. Recovery of ICCs below -1.0 was poorer in the initial
sample. Having no restriction in the range of the sample is not always
the best condition for detecting item bias. Some range restriction may
still allow most of the indices to detect item bias reasonably
accurately. This is somewhat consistent with the fact that the abilities
of persons at the extremes of the ability continuum are likely to be
estimated more poorly; removing these cases can sometimes make the
overall calibration more accurate.
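The validity criterion used throughout is the Pearson correlation between generated and detected bias. It is computed over all 50 items and over the subset of items with difficulty between -2 and +2, and the calculation might look as below. This is a sketch only. The function name is hypothetical, the input arrays stand in for the per-item bias values produced by whichever index is being evaluated, and the example data are synthetic rather than the study's.

```python
import numpy as np


def validity_correlations(true_bias, detected_bias, difficulties, b_limit=2.0):
    """Pearson r between generated and detected bias, for all items and for
    the items whose difficulty lies within [-b_limit, +b_limit].

    Returns (r_all, r_subset, n_subset).
    """
    true_bias = np.asarray(true_bias, dtype=float)
    detected_bias = np.asarray(detected_bias, dtype=float)
    difficulties = np.asarray(difficulties, dtype=float)

    r_all = np.corrcoef(true_bias, detected_bias)[0, 1]

    mask = np.abs(difficulties) <= b_limit  # e.g. the 27-item subset
    r_subset = np.corrcoef(true_bias[mask], detected_bias[mask])[0, 1]
    return float(r_all), float(r_subset), int(mask.sum())


# Synthetic illustration only; these are not the study's data.
rng = np.random.default_rng(1)
b = rng.uniform(-3.0, 3.0, size=50)
generated = rng.normal(size=50)
detected = generated + rng.normal(scale=0.5, size=50)
print(validity_correlations(generated, detected, b))
```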

Root mean square errors were computed to cross-validate the results
obtained for the bias detection techniques. Recovery of the ICCs became
worse as range restriction increased, regardless of the amount of item
bias. Since the error in estimating ICCs increased clearly and
systematically with the degree of restriction, one might have expected
the effect of restriction on the bias detection indices to be equally
clear. However, this was not always the case. This may be because the
ICCs were not recovered equally accurately in different regions of the
ability continuum.

One limitation of the present study is that restrictions were created
by directly truncating the normal distribution and using its upper part.
For the various samples used in the study, the ability values were
correlated with the number-right scores based on the simulated
responses, and these correlations averaged approximately 0.85. In some
applied settings the ability (θ) distribution of test-takers may well be
skewed rather than truncated, since most organizations select people on
the basis of raw scores. Such restriction is likely to be incidental
rather than direct, as it was in the present simulation.
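The limitation above contrasts the direct truncation used in this study with the incidental restriction that arises when people are selected on raw scores. The following is a minimal NumPy sketch of both under stated assumptions. Abilities are standard normal. A cut of -1.0 corresponds to the Z condition described earlier, while the W, X and Y cut points are not restated in this section, so none are assumed. The incidental variant uses a hypothetical helper, `sample_incidental`, which selects on a standardized score that correlates 0.85 with ability. That value echoes the average ability/number-right correlation reported above.

```python
import numpy as np


def sample_restricted_abilities(n, cut=None, rng=None):
    """Draw n abilities from N(0, 1), keeping only values above `cut`.

    cut=None gives the unrestricted (N) condition. Draws below the cut are
    discarded and replaced, i.e. the distribution is directly truncated and
    its upper portion retained.
    """
    rng = np.random.default_rng() if rng is None else rng
    if cut is None:
        return rng.standard_normal(n)
    kept = np.empty(0)
    while kept.size < n:
        draws = rng.standard_normal(2 * n)
        kept = np.concatenate([kept, draws[draws > cut]])
    return kept[:n]


def sample_incidental(n, score_cut, r=0.85, rng=None):
    """Hypothetical incidental selection: keep examinees whose standardized
    observed score (correlated r with ability) exceeds score_cut, and
    return their abilities. Abilities below the cut can survive."""
    rng = np.random.default_rng() if rng is None else rng
    kept = np.empty(0)
    while kept.size < n:
        theta = rng.standard_normal(2 * n)
        score = r * theta + np.sqrt(1.0 - r ** 2) * rng.standard_normal(2 * n)
        kept = np.concatenate([kept, theta[score > score_cut]])
    return kept[:n]


rng = np.random.default_rng(2024)
direct = sample_restricted_abilities(1000, cut=-1.0, rng=rng)  # like Z
incidental = sample_incidental(1000, score_cut=-1.0, rng=rng)
print(direct.min(), incidental.min())
```

Direct truncation leaves no abilities below the cut. The mean of a standard normal truncated at -1.0 is about 0.29. Selection on a correlated score instead produces a thinned, skewed lower tail rather than a hard edge.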
Footnotes

1. This exception was necessitated by the fact that there is no sampling
error within the entire subpopulation, so a true chi-square value could
not be calculated. As the Total Area index takes into account both slope
and difficulty differences, it was chosen as representing information
similar to that captured in the chi-square index. It should be noted
that previous research has generally found the highest correlations of
the chi-square index with the Total Area index (Shepard, et al., 1981;
McCauley & Mendoza, 1985).
Table 1

Correlations Between True and Detected Bias: Total Area Index 
10 Biased Items 40 Biased Items



Condition 50 items 27 items 50 items 27 items 

A (n=1000) 



B (n=600) 0.32 o.a'f 0.16 



0.28 

O.O'f 
0.27 

0.1*7 



W B (n=1000) 0.05 O.K 0.35 0.15 

B (n=600) 0.16 0.32 0.32 0 27 

B (n=300) 0.^5 0.7^ 0.^9 0.58 

X B (n=1000) 0.26 0.30 0.37 

B (n=600) 0.16 0.35 ' 0.27 

B (n=300) 0.06 0.09 0.33 

Y B (n=1000> 0.36 0.68 0.^3 

B (n=600) 0.27 0.^9 0.28 0.36 

B (n=300) 0.05 0.15 o.« 0 

Z B (n=1000) 0.3^ 0.72 0.^9 0 76 

B (n=600) 0.23 0.75 -0.07 0 26 

B {n=300) 0.32 0.80 o.« 0.63 

N B (n=loOO) 0.19 0.58 0.07 0.16 

B (n=600) 0.30 0.85 0.20 0 16 

B (n=300) 0.25 0.53 o.lO 0.21 

A (n=600) 

U B (n=600) 0.12 0.25 0.10 0 55 

B 'n=300) 0.39 0.5^ o.3k 0.^3 

X B (n=600) 0.19 0.^19 0.30 0.23 

B (n=300) 0.0^ O.K 0.3^ ' 0 2'* 

Y B (n=600) 0.^9 0.71 0.61 0 6^ 

B (n=300) 0,26 0.^5 O.^^O 0 36 

2 B (n=600) 0.21 0.56 0.3^1 

B (n=300) 0.39 0.69 ' 0.39 



0.^6 

0.38 

B (n=300) 0.28 0.62 0.05 ^'^'^ 0.27 

A & B (n=300)



0.08 

0.38 0.30 



^ -0.03 O.l'* 0.0^ 

X 0.09 0.12 

^ 0-0^ 0.09 0.25 -0.05 

^ 0-01 0.36 0.25 0.22 

N 0.5^* 0.82 0.13 O.S^i 

Note. "W" was most severe range restriction condition. "N" was no
range restriction condition. "A" indicates majority group, "B"
indicates minority group.
Table 2

Correlations Between True and Detected Bias: Base High Area Index

10 Biased Items 40 Biased Items



Condition 
A (n=1000) 



50 items 



27 items 



50 items 



27 items 



W B (n' 

B (n: 

B (n: 

X B Cn: 

B (n= 

B (n: 

Y B (n= 



B (n^ 

B (n= 

2 B (n= 

B (n= 

B (n= 

N B Cn= 

B (n= 

B (n= 

A (n=600) 



= 1000) 
=600) 
»300) 
= 1000) 
=600) 
=300) 
=1000) 
=600) 
^300) 
^1000) 
^600) 
300) 
1000) 
600) 
300) 



0.16 



0.22 



0.^3 



0.5^ 



0.^6 



0.16 
0.18 

0.12 
0.06 

0.36 
0.20 

0.5^ 

0.52 
0.^8 



0 51 



0.^3 



0.81 



0.89 



0.8A 



0.^9 
0.^7 

0.52 
0.38 

0.68 
0.^1 

0.81 
0.93 

0.95 
0.77 



0.20 



0.22 



0.3^ 



0.67 



0.38 



0.19 
0.07 

0.13 

OAk 

0.21 
0.38 

0.17 
0.66 

0.^6 
0.38 



0.22 



0.27 



0.^f7 



0.92 



0.59 



0.22 
0.15 

0.19 
0.16 

0.27 
0.^6 

0.30 
0.81 

0.58 
0.66 



W 



B 
B 
B 
B 
B 
B 
B 
B 
B 
B 



(n=600) 
(n=300) 
(n=600) 
Cn=300) 
(n=600) 
(n=300) 
(n=600) 
(n=300) 
Cn=600) 
(n=300) 



0.18 



0.36 



0.39 



0.16 



0.07 



0.22 



0.57 



0.38 



0.53 



0.58 



0.69 



0.9^ 



0.37 



0.38 



0.^0 



0.90 



0.81 



0.16 



O.l^f 



0.^7 



0.51 



0.39 



0.08 



O.Ik 



0.k3 



0.55 



0.33 



0.25 



0.22 



0.58 



0.67 



0.53 



0. 15 



0.1^ 



0.50 



0.63 



0.66 



A & B (n=300)



W 
X 
Y 
Z 
N 



0.06 
0.02 
0.06 
0.32 
0.62 



0. 19 
0.30 
0.37 
0 82 
0.86 



0.09 
0.12 
0.15 
0.37 
0.39 



0.21 
0. 10 
0. 13 
0.^5 
0.67 



Note. "W" was most severe range restriction condition. "N" was no
range restriction condition. "A" indicates majority group, "B"
indicates minority group.
Table 3

Correlations Between True and Detected Bias: Base Low Area Index 

10 Biased Items 40 Biased Items



Condition
A (n=1000) 



50 items 



27 items 



50 items 



27 items 



W B in 
B in 
B (n 

X B (n 
B in 
B (n 

Y B (n^ 
B (n' 
B (n' 

2 B (n' 
B (n= 
B (n= 

N B (n« 
B (n= 
B (n= 



=1000) 
=600) 
=300) 
=1000) 
=600) 
=300) 
■1000) 
=600) 
^300) 

aooo) 

^600) 
=300) 
1000) 
600) 
300) 



A (n=600) 



W 



B 
B 
B 
B 
B 
B 
B 
B 
B 
B 



(n=600) 
(n=300) 
(n=600) 
(n=300) 
(n=600) 
(n=300) 
(n=600) 
(n=300) 
(n=600) 
(n=300) 



-0.15 



-O.O^f 



0.09 



0.00 



0.06 



-O.U 
-0.02 

0.01 
-0.02 

0.03 
-0.07 

-0.00 
0.22 

0.03 
0.06 



-0.22 



-0.09 



0.08 



0.02 



-0.11 



0.02 



0.^^6 



-0.19 



0.26 



0.03 



-0.22 



-0.27 



-0.01 



-0.19 



-0.12- 



0.06 
0.18 

-0.17 
-0.22 

-0.16 
0.07 

-O.OG 
0.13 

0.08 
0.09 



-0.19 



-0.13 



0.21 



-0.17 



-0.15 



-0. 10 



0. 16 



0.1^ 



0.15 



0.28 



0.36 



0.55 



0.50 



0.26 
0.01 

0.^6 
0.28 

0.07 
0.39 

0.25 
0.66 

0.53 
0.39 



0.17 



0.56 



0. 11 



0.38 



0.50 



-0.11 



0.36 



0.32 



0.38 



0.39 



-0.28 



-0.05 



0.35 



0.^7 



0. 19 



-0.20 
-0.22 

0.06 
0.16 

-0.11 
0.38 

O.O^f 
0.^9 

0.16 
0.23 



0.06 



0. 16 



-0.08 



0. 11 



0.25 



-0.07 



0. 18 



O.U 



0.20 



0.29 



A & B (n=300)



W 
X 
Y 
Z 

N 



-0. 18 
-0.15 
0.17 
0.08 
-0.06 



-0.26 
-0.15 
0.08 
O.U 
-0.01 



-0.05 
0.27 
0.33 
0.^7 
0.29 



-0.09 
0.12 

-0.16 
0. 17 
0.01 



Note. "W" was most severe range restriction condition. "N" was no
range restriction condition. "A" indicates majority group, "B"
indicates minority group.
Table 4

Correlations Between True and Detected Bias: Signed Area Index 

10 Biased Items 40 Biased Items

Condition 50 items 27 items 50 items 27 items

A (n=1000) 

W B (n=1000) 0.^2 o.6l 0.65 0 71 

B (n=600) o.M 0.6^ 0.68 0.71 

B (n=300) 0.51 0.7O 0.^7 0.63 

X B (n=1000) 0.59 0.71 0.62 0.6^ 

B (n.600) 0.^9 0.76 0.56 0.66 

" '"=3°°' 0-37 0.61 0.50 0 -53 

Y B (n=1000) 0.55 0.75 0.^3 0 70 

S 'n=600) 0.^9 0.67 0.^5 " 0 ^9 

B (n-300) 0.35 0.^5 0.70 0.'7^ 



2 B (n=1000) 0.55 



O.B'f 0.7^1 0.95 



I tnn 0-35 0.31 

^ '"=300) 0.57 0.05 0.70 0 83 

N B (n=1000) 0.54 0.77 0.63 0 72 ' 

B <"=600) 0.56 0.86 0.6^* ' 0 72 

B (n=300) 0.55 0.75 0.54 0:72 

A (n=600) 

W B (n=600) 0.51 0.64 0.59 0 64 

B (n=300) 0.53 0.67 0.39 ' 0.55 

X B (n=600) 0.45 0.76 0.54 0 59 

B 'n=300) 0.41 0.64 0.45 ' 0 44 

Y B (n=600) 0.60 0.8O 0.63 0 68 

7 n 0.69 ■ 0.69 

Z B (n=600) 0.50 0.65 0.69 0 70 

M n 0.69 ■ 0.67 

N B (n=600) 0.56 0.88 0.59 0 70 

B (n=300) 0.54 0.77 0.53 ' 0 69 



A & B (n=300) 

" 0.11 0.19 

^ 0.30 0.56 



2 0.38 



0.10 0.26 
0.42 0.60 



I 0.28 0.40 0.46 o!^8 



0.69 0.51 0.64 



^ 0.68 0.78 0.5'* o!65 



Note. "W" was most severe range restriction condition. "N" was no
range restriction condition. "A" indicates majority group, "B"
indicates minority group.
Table 5

Correlations Between True and Detected Bias: RMSD Index 

10 Biased Items 40 Biased Items

Condition 50 items 27 items 50 items 27 items

A (n=1000) 

W B (n=1000) 0.0^ 0.15 0.^9 0 33 

! 0.^2 0.« ' 0 ^3 

S '"=300' O-S'* 0.7^ 0.5B 0 70 

X B (n=1000) 0.30 0.29 C.i^B 0 35 

B (n=600) 0.13 0.35 0.3^ ' 0.16 

B (n=300) 0.11 0.08 0.^^ 0 37 

Y B (n=1000) 0.39 o.70 0.52 0.65 

B (n=600) 0.25 0.^7 o.23 0.36 

2 B (n=1000) 0.31 0.71 0.51 0.80 

B ;n=600) 0.15 0.77 -0.12 0.28 

B (n=300) 0.30 0.79 0.^9 0 73 

N B (n=1000) 0.16 o.^7 o.03 0.10 

B n=600) 0.32 0.70 0.i6 0.16 

^ <n=300) 0.29 0.5^ 0.01 0.19 

A (n=600) 

" I 0-20 0.31 

Y ^ "=300) 0.33 0.51 o.M 0.50 
X B (n=600) 0.10 0.50 0.29 0 29 

B (n=300) 0.00 0.03 0.39 0.29 

Y B (n=600) 0.37 o.6^ 0.56 0.60 

B (n=300) 0.19 0.^0 0.^7 0.^6 

2 B (n=600) 0.13 0.^6 0.29 0.5^ 

I ;"=300) 0.^0 0.63 0.^^ 0.5^ 

N B (n=600) 0.30 0.80 0.12 0 25 

B <n=300) 0.25 0.68 -0.05 0.2^ 

A & B (n=300)

^ -0.05 0.10 0.05 o.lt* 

X 0.06 -0.02 CiB 0.^6 

I 0-10 0.18 0.37 0.20 

I 0.0^ 0.30 0.23 o.-a'i 

^ 0.55 0.83 0.0^ 0.27 

Note. "W" was most severe range restriction condition. "N" was no
range restriction condition. "A" indicates majority group, "B"
indicates minority group.
Table 6

Correlations Between True and Detected Bias: Lord's Chi-Square Index 

10 Biased Items 40 Biased Items



Condition
A (n=1000) 



50 items



27 items 



50 items 



27 items 



W B Cn' 
B (n' 
B (n: 

X B (n= 
B (n= 
B (n= 

Y B (n« 
B (n= 
B (n= 

Z B (n= 
B (n= 
B (n= 

N B (n= 
B (n= 
B (n= 

A (n=600) 



=1000) 
=600) 
=300) 
=1000) 
=600) 
=300) 

a 000) 

:600) 
^300) 
1000) 
600) 
=300) 
1000) 
600) 
300) 



W B (n=600) 

B (n=300) 

X B (n=600) 

B (n=300) 

Y B (n=600) 

B (n=300) 

I B (n=600) 

B (n=300) 

N B (n=600) 

B (n=300) 

A & B (n=300)

W 

X 
Y 
2 

N 



0.11 



0.5^f 



0.56 



0.54 



0.61 



0.06 
0.05 

0.17 

0.55 
0.35 

0.45 
0.45 

0.71 
0.48 



-0.06 



-0.05 



0.53 



0.59 



0.53 



0.80 



0.22 



0.53 



0.50 



0.59 



-0.07 
0.10 

-0.22 
0.34 
0.52 



0.10 



0.53 



0.64 



0.73 



0.79 



0.10 
0.87 

0.54 
0.23 

0.61 
0.52 

0.71 
0.72 

0.92 
0.65 



-0.22 



-0.12 



0.67 



0.77 



0.63 



0.94 



0.23 



0.67 



0.59 



0.72 



-0.18 
0.08 

-0.34 
0.43 
0.57 



0.21 



0.38 



0.46 



0.46 



0.28 



0.27 
0.23 

0.21 
0.24 

-0.01 
0.32 

0.19 
0.40 

0.19 
0.40 



0.27 



0.25 



0.16 



0.56 



0.32 



0.11 



0.33 



0.35 



0.39 



0.26 



0. 15 
0.33 
0.30 
0.40 
0.22 



-0.07 



0.13 



0.37 



0.53 



0.30 



0.29 
0.36 

-0.06 
0.14 

0.03 
0.24 

0.25 
0.44 

0.18 
0.19 



0.35 



0.21 



0.10 



0.55 



0.26 



0.11 



0.27 



0.32 



0.36 



0.29 



0.21 
0.22 
0.12 
0.3^ 
0.20 



Note. "W" was most severe range restriction condition. "N" was no
range restriction condition. "A" indicates majority group, "B"
indicates minority group.
Table 7

Average Root Mean Square Errors Across 50 Items

Sample Size   R   Group A   Group B(10)   Group B(40)

1000          W   0.2909    0.1328        0.1263
              X   0.3138    0.1207        0.1458
              Y   0.2533    0.0932        0.1046
              Z   0.1774    0.0591        0.0554
              N   0.1200    0.0550        0.0419

600           W   0.3286    0.1318        0.1509
              X   0.3112    0.0777        0.1246
              Y   0.2638    0.0638        0.1176
              Z   0.1712    0.1076        0.0530
              N   0.1109    0.0387        0.0390

300           W   0.3057    0.2329        0.1998
              X   0.3113    0.0921        0.1218
              Y   0.3230    0.0970        0.0727
              Z   0.1950    0.0575        0.0560
              N   0.1387    0.0491        0.0570

Note. Column R denotes the restriction condition, with "W" being the
most severe and "N" being no range restriction. Group B(10) denotes a
test containing 10 biased items. Group B(40) denotes a test containing
40 biased items.



